A scoring rule is a loss function measuring the quality of a quoted
probability distribution $Q$ for a random variable $X$, in the light of
the realized outcome $x$ of $X$; it is proper if the expected score,
under any distribution $P$ for $X$, is minimized by quoting $Q = P$. Using
the fact that any differentiable proper scoring rule on a finite sample
space $\mathcal{X}$ is the gradient of a concave homogeneous function, we
consider when such a rule can be local in the sense of depending only on
the probabilities quoted for points in a nominated neighborhood of
$x$. Under mild conditions, we characterize such a proper local scoring
rule in terms of a collection of homogeneous functions on the
cliques of an undirected graph on the space $\mathcal{X}$. A useful property
of such rules is that the quoted distribution $Q$ need only be known up
to a scale factor. Examples of the use of such scoring rules include
Besag's pseudo-likelihood and Hyvärinen's method of ratio matching.

1. Introduction. Let $\mathcal{X}$ be a finite set, let $\mathcal{A}$ be the
set of real vectors $\alpha = (\alpha_x : x \in \mathcal{X})$ with each
$\alpha_x > 0$, and let $\mathcal{P} = \{p \in \mathcal{A} : \sum_x p_x = 1\}$
be the set of such vectors corresponding to strictly positive probability
distributions on $\mathcal{X}$. We will use $P$ for the distribution
determined by $p$ (similarly $Q$ for $q$), and generally do not distinguish
between them. For $\alpha \in \mathcal{A}$, $C \subseteq \mathcal{X}$ we write
$\alpha_C := (\alpha_x : x \in C)$, and similarly $p_C$.

Consider a game between Forecaster and Nature, where Forecaster quotes
a distribution $Q \in \mathcal{P}$ as representing his uncertainty about a
quantity $X$ taking values in $\mathcal{X}$, and Nature then reveals $X = x$.
A scoring rule [see, e.g., Dawid (1986)] is a function
$S : \mathcal{X} \times \mathcal{P} \to \mathbb{R}$. The interpretation is that
$S(x, Q)$ measures the loss suffered by Forecaster for the above outcome of
the game.

For $P \in \mathcal{P}$ we define $S(P,Q) := \sum_x p_x S(x,Q)$, the expected
score when Forecaster quotes $Q$, and Nature generates $X$ from $P$. The
scoring rule $S$ is proper if always $S(P,Q) \ge S(P,P)$, so that it is
always optimal to quote a distribution $Q$ matching the real uncertainty $P$;
$S$ is strictly proper if furthermore $S(P,Q) > S(P,P)$ when $Q \neq P$.

The generalized entropy function, or uncertainty function,
$H : \mathcal{P} \to \mathbb{R}$, associated with a proper scoring rule $S$ is
given by $H(P) := S(P,P)$. Then $H$ is a concave function on $\mathcal{P}$. We
also introduce the associated divergence or discrepancy function
$d : \mathcal{P} \times \mathcal{P} \to \mathbb{R}$, where
$d(P,Q) := S(P,Q) - H(P)$. Then $d(P,Q) \ge 0$, with equality if $Q = P$ (and
only in this case if $S$ is strictly proper).
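The definitions of expected score, entropy and divergence can be checked numerically. The following is a minimal sketch for the logarithmic score on a three-point sample space; the function names and the particular vectors `p` and `q` are illustrative assumptions, not part of the paper.

```python
import math

def log_score(x, q):
    """Logarithmic score S(x, Q) = -log q_x for the outcome with index x."""
    return -math.log(q[x])

def expected_score(p, q, score=log_score):
    """S(P, Q) = sum_x p_x S(x, Q)."""
    return sum(p[x] * score(x, q) for x in range(len(p)))

def entropy(p, score=log_score):
    """H(P) = S(P, P)."""
    return expected_score(p, p, score)

def divergence(p, q, score=log_score):
    """d(P, Q) = S(P, Q) - H(P); nonnegative when the rule is proper."""
    return expected_score(p, q, score) - entropy(p, score)

# Illustrative (hypothetical) distributions on a three-point sample space.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

print(entropy(p))        # Shannon entropy of p
print(divergence(p, q))  # Kullback-Leibler divergence of q from p, positive
print(divergence(p, p))  # zero: quoting Q = P is optimal
```

For the logarithmic score the divergence is the Kullback-Leibler divergence; passing a different `score` function gives the corresponding quantities for another rule.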

As well as being of intrinsic interest, proper scoring rules have a range of
applications. For example, if $\mathcal{Q} = \{Q_\theta : \theta \in \Theta\}$
is a smooth parametric statistical model, we might estimate $\theta$, based on
a random sample $(x_1, \ldots, x_n)$, by minimizing the empirical discrepancy,
$d(P_n, Q_\theta)$, where $P_n$ is the empirical distribution of the data.
This is equivalent to minimizing $\sum_{i=1}^n S(x_i, Q_\theta)$.
Implementing this by setting the derivative of this criterion to 0 will yield
an unbiased estimating equation [Dawid and Lauritzen (2005), Dawid (2007)],
from which, under suitable smoothness assumptions, we can deduce statistical
properties of the associated estimator such as consistency and asymptotic
normality. For the well-known logarithmic score $S(x,Q) = -\log q_x$ this
procedure leads to the maximum likelihood estimator, but it is of interest to
use other scoring rules and estimators, for example, because they can lead
to greatly simplified calculations. We illustrate this in Section 4 for
capture-recapture experiments and pseudo-likelihood estimation for image
analysis; see also Czado, Gneiting and Held (2009) for a range of other
applications of proper scoring rules to discrete data.

Parry, Dawid and Lauritzen (2012) investigated when a proper scoring
rule with $\mathcal{X}$ an interval on the real line can be local in the sense
that $S(x,Q)$ depends on $Q$ only through the value $f(x)$ of the density $f$
of $Q$ and the values $f^{(k)}(x)$ of a finite number of the derivatives of
$f$ at the realized outcome $x$; Ehm and Gneiting (2012) studied rules with
$k = 2$ in further detail. It was shown in Parry, Dawid and Lauritzen (2012)
that any proper local scoring rule is a linear combination of the logarithmic
score and what was termed a key local scoring rule; and that any such key
local scoring rule is 0-homogeneous in the sense that $S(x,Q)$ can be
evaluated when $f$ is only known up to proportionality. The results in this
article for discrete sample spaces parallel these. However, in the case of
discrete $\mathcal{X}$ we have to redefine locality using a neighborhood
structure on the space $\mathcal{X}$, and use somewhat different techniques of
proof.

The organization of the paper is as follows. In Section 2 we review results 
from Hendrickson and Buehler (1971) characterizing proper scoring rules as 
supergradients of concave functions. 

In Section 3 we formally define what it means for a scoring rule to be local
with respect to a neighborhood system and show that if the homogeneous
extension of the scoring rule is local, the neighborhood system must be
determined by an undirected graph. We also describe a general additive
construction of local scoring rules. Section 4 gives examples of local scoring
rules and their use. In Section 5 we proceed in parallel to Parry, Dawid and
Lauritzen (2012) by a variational argument to characterize local scoring rules
as solutions to a key differential equation and, under an additional condition
on the neighborhood system, we show that such local scoring rules can be
expanded in additive terms indexed by complete subsets of an undirected graph.

2. Homogeneous proper scoring rules. Further analysis is facilitated by 
recasting the problem in terms of homogeneous functions and using the 
fundamental characterization of proper scoring rules given by McCarthy 
(1956) and Hendrickson and Buehler (1971). 

2.1. Homogeneous functions. A function $f : \mathcal{A} \to \mathbb{R}$ is
called (positive) homogeneous of order $h$, or $h$-homogeneous, if

$$
f(\lambda\alpha) = \lambda^h f(\alpha) \quad \text{for all } \lambda > 0. \tag{1}
$$

In this paper we shall only need homogeneity of orders 0 and 1. If $f$ is
differentiable, (1) will hold if and only if $f$ satisfies Euler's equation:

$$
\sum_x \alpha_x \frac{\partial f(\alpha)}{\partial \alpha_x} = h f(\alpha). \tag{2}
$$

Even when $f$ is not differentiable, in some circumstances we can reinterpret
(2) so as to continue to apply.

Definition 2.1. A vector $\nabla f(\alpha) \in \mathcal{A}$ is a supergradient
to $f$ at $\alpha$ if, for all $\beta \in \mathcal{A}$,

$$
f(\alpha) + (\beta - \alpha)^{\top} \nabla f(\alpha) \ge f(\beta).
$$

When $f$ is differentiable at $\alpha$ and has a supergradient
$\nabla f(\alpha)$ there, it must coincide with the gradient vector
$(\partial f/\partial \alpha_x : x \in \mathcal{X})$. Lemma 2.1 below
and Corollary 2.2 extend Euler's equation (2) to homogeneous functions
with a supergradient and are equivalent to Theorem 2.1 of Hendrickson and
Buehler (1971) and subsequent remarks, so we omit the proofs here.

Lemma 2.1. Suppose $f$ is $h$-homogeneous, and has a supergradient
$\nabla f(\alpha)$ at $\alpha$. Then

$$
\alpha^{\top} \nabla f(\alpha) = h f(\alpha). \tag{3}
$$

Corollary 2.2. Suppose $f$ is 1-homogeneous. Then $S$ is a supergradient
of $f$ at $\alpha$ if and only if

$$
\beta^{\top} S \ge f(\beta)
$$

for all $\beta \in \mathcal{A}$, with equality when $\beta = \alpha$.

By the supporting hyperplane theorem, a function $f$ is concave on
$\mathcal{A}$ if and only if it has a supergradient at each
$\alpha \in \mathcal{A}$ (not necessarily unique if $f$ is not differentiable
at $\alpha$). A supergradient function $\nabla f$ associates a specific choice
of supergradient $\nabla f(\alpha)$ with each point $\alpha \in \mathcal{A}$.
If $f$ is $h$-homogeneous, (3) holds at each $\alpha \in \mathcal{A}$ for any
choice of supergradient function $\nabla f$.

2.2. Homogeneous scoring rules. Clearly, any scoring rule $S(x,P)$ can
readily be extended to $\mathcal{A}$ by defining
$S(x,\alpha) := S(x, \alpha/\alpha_+)$, where
$\alpha_+ := \sum_{y \in \mathcal{X}} \alpha_y$. So extended, $S(x,\alpha)$ is
a 0-homogeneous function of $\alpha$ for every $x$ and we say that
$S(x,\alpha)$ is a 0-homogeneous scoring rule.
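The extension by normalization can be sketched in a few lines. The example below is an illustration only: it uses the quadratic (Brier) rule $\|p\|^2 - 2p_x$ as a stand-in scoring rule, and the helper name `homogeneous_extension` is a hypothetical choice.

```python
def brier_score(x, p):
    """Brier score S(x, p) = ||p||^2 - 2 p_x, used here as an example rule."""
    return sum(v * v for v in p) - 2.0 * p[x]

def homogeneous_extension(score):
    """Return S(x, alpha) := S(x, alpha / alpha_+) for positive vectors alpha."""
    def extended(x, alpha):
        total = sum(alpha)  # alpha_+
        return score(x, [a / total for a in alpha])
    return extended

S = homogeneous_extension(brier_score)

alpha = [2.0, 1.0, 5.0]
scaled = [10.0 * a for a in alpha]   # same direction, different scale

# 0-homogeneity: rescaling alpha leaves the extended score unchanged.
print(S(0, alpha), S(0, scaled))
```

The two printed values agree up to rounding, which is the 0-homogeneity property stated in the text.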

McCarthy (1956) states that a 0-homogeneous scoring rule $S$ is proper
if and only if it can be expressed as the supergradient of a concave
1-homogeneous function $H : \mathcal{A} \to \mathbb{R}$. This is formally
proved in Hendrickson and Buehler (1971) and stated below in Theorems 2.3
and 2.4.

Theorem 2.3. Suppose $H : \mathcal{A} \to \mathbb{R}$ is concave and
1-homogeneous. Let $\nabla H$ be a supergradient of $H$, and, for
$x \in \mathcal{X}$, $p \in \mathcal{P}$, define $S(x,p)$ to be the
$x$-component of the vector $S(p) := \nabla H(p)$. Then $S$ is a proper
scoring rule, and the associated entropy at $p$ is $H(p)$.

We note that the definition $S(\alpha) := \nabla H(\alpha)$ can be used to
extend the domain of $S$ from $\mathcal{X} \times \mathcal{P}$ to
$\mathcal{X} \times \mathcal{A}$. The supergradient function $\nabla H$ can be
taken to be 0-homogeneous and then $S(x,\alpha)$ is a 0-homogeneous function
of $\alpha$.

For the converse direction, starting with a scoring rule $S$ defined on
$\mathcal{X} \times \mathcal{P}$, we let $S(x,\alpha)$ denote its
0-homogeneous extension as described above and let $S(\alpha)$ be the vector
with $x$-component $S(x,\alpha)$.

Theorem 2.4. Suppose that $S(x,\alpha)$ is a 0-homogeneous proper scoring
rule. Define $H(\alpha) := \alpha^{\top} S(\alpha)$. Then $H$ is 1-homogeneous
and concave, and $S(\alpha)$ is a supergradient of $H$ at $\alpha$.

As a consequence we obtain the following symmetry relation for the partial
derivatives of any 0-homogeneous proper scoring rule.

Corollary 2.5. If $S$ is a 0-homogeneous proper scoring rule, and
$S(x,\alpha)$ is continuously differentiable on $\mathcal{A}$ for each
$x \in \mathcal{X}$, then

$$
\frac{\partial S(x,\alpha)}{\partial \alpha_y}
  = \frac{\partial S(y,\alpha)}{\partial \alpha_x}. \tag{4}
$$

Proof. In this case $H(\alpha) = \alpha^{\top} S(\alpha)$ is differentiable on
$\mathcal{A}$, so $S(\alpha)$ is its gradient. It immediately follows that $H$
is twice continuously differentiable. Then (4) follows from
$\partial^2 H/\partial\alpha_y\,\partial\alpha_x
  = \partial^2 H/\partial\alpha_x\,\partial\alpha_y$. □

Example 2.1. Examples of proper scoring rules are the Brier score
$S(x,p) = \|p\|^2 - 2p_x$ [Brier (1950)], where $\|p\|^2 = \sum_x p_x^2$, with
1-homogeneous entropy function $H(\alpha) = -\|\alpha\|^2/\alpha_+$, and the
spherical score $S(x,p) = -p_x/\|p\|$ with 1-homogeneous entropy function
$H(\alpha) = -\|\alpha\|$ [Good (1971), Dawid (2007)].
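The relationship between these two scores and their entropy functions can be checked numerically: each score should be the gradient of its 1-homogeneous entropy function, and the entropy at $p$ should equal the expected score under $p$. The sketch below uses a central finite difference as a stand-in for the exact gradient; the helper `numerical_gradient` and the test vector `p` are illustrative assumptions.

```python
import math

def brier_entropy(alpha):
    """H(alpha) = -||alpha||^2 / alpha_+."""
    return -sum(a * a for a in alpha) / sum(alpha)

def spherical_entropy(alpha):
    """H(alpha) = -||alpha||."""
    return -math.sqrt(sum(a * a for a in alpha))

def brier_score(x, p):
    """S(x, p) = ||p||^2 - 2 p_x."""
    return sum(v * v for v in p) - 2.0 * p[x]

def spherical_score(x, p):
    """S(x, p) = -p_x / ||p||."""
    return -p[x] / math.sqrt(sum(v * v for v in p))

def numerical_gradient(f, alpha, eps=1e-6):
    """Central finite-difference approximation to the gradient of f at alpha."""
    grad = []
    for i in range(len(alpha)):
        up, down = list(alpha), list(alpha)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad

p = [0.5, 0.3, 0.2]  # hypothetical distribution
for H, S in [(brier_entropy, brier_score), (spherical_entropy, spherical_score)]:
    gradient = numerical_gradient(H, p)
    scores = [S(x, p) for x in range(len(p))]
    expected = sum(p[x] * scores[x] for x in range(len(p)))
    print(gradient, scores)   # the two vectors agree up to discretization error
    print(H(p), expected)     # entropy equals expected score, as in (3) with h = 1
```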

We shall say that the entropy function $H$ is regular if it is continuous on
$\mathcal{A}$ and its closure $\operatorname{cl} H$ as a concave function
[Rockafellar (1970), page 52] is finite on the closed cone
$\bar{\mathcal{A}} = \{\alpha : \alpha_x \ge 0, x \in \mathcal{X}\}$. In other
words, $H$ is regular if it can be extended by continuity to have finite
values for all $\alpha \in \bar{\mathcal{A}}$.

Clearly, since $H(\alpha) = \sum_x \alpha_x S(x,\alpha)$, $H(\alpha)$ is
certainly regular if $S(x,P)$ is bounded in $P$ for each $x$. Both the Brier
score and the spherical score satisfy this requirement and have regular
entropy functions, but in general boundedness is not necessary for regularity.

3. Local scoring rules. In general, as for the Brier and spherical score, 
S(x,P) will depend on every element of P. We are interested in cases where 
this is not so. 

3.1. Locality. Suppose we specify, for each $x \in \mathcal{X}$, a set
$N_x \subseteq \mathcal{X}$ (the neighborhood of $x$), containing $x$, and
require that the proper scoring rule $S(x,P)$ be expressible as a function of
$x$ and the restriction $p_{N_x}$ of $p$ to $N_x$:

$$
S(x,P) = s(x, p_{N_x}).
$$

We say that such a scoring rule is $\mathcal{N}$-local, where
$\mathcal{N} = \{N_x : x \in \mathcal{X}\}$ is the neighborhood system.
Similarly, its 0-homogeneous extension is said to be $\mathcal{N}$-local if
$S(x,\alpha) = s(x, \alpha_{N_x})$. Note this property is strictly stronger;
see Section 3.2 below.

Suppose that the 0-homogeneous extension of $S$ is continuously
differentiable and $\mathcal{N}$-local. We then obtain from (4) that, if
$x \notin N_y$,

$$
\frac{\partial S(x,\alpha)}{\partial \alpha_y}
  = \frac{\partial S(y,\alpha)}{\partial \alpha_x} = 0,
$$

so that without loss of generality we can also require $y \notin N_x$. Hence,
for scoring rules with $\mathcal{N}$-local 0-homogeneous extensions we can
assume that the neighborhood relation is symmetric and so determined by an
undirected graph $\mathcal{G}$ so that $y \in N_x$ if and only if $x = y$ or
$x - y$, that is, $x$ and $y$ are neighbors in $\mathcal{G}$. We then also say
that the scoring rule and its extension are $\mathcal{G}$-local.

We note that a scoring rule with a $\mathcal{G}$-local 0-homogeneous extension
only depends on $P$ through its conditional distribution $P_{|N_x}$ of $X$
given $X \in N_x$, that is, it satisfies

$$
S(x,P) = s(x, p_{|N_x}). \tag{5}
$$

In particular, only knowledge of $p$ up to a constant factor is necessary to
calculate $S(x,P)$. Conversely, the 0-homogeneous extension of any scoring
rule satisfying (5) is $\mathcal{G}$-local.

3.2. Logarithmic score. The simplest case of a local scoring rule is
where $S(x,P)$ is a function only of $x$ and $p_x$, and is thus
$\mathcal{G}_0$-local for the totally disconnected graph $\mathcal{G}_0$. It
is well known [Bernardo (1979)] that (for $\#\mathcal{X} > 2$) a scoring rule
with this property is proper (and is then strictly proper) if and only if it
has the form

$$
S(x,P) = a(x) - \lambda \ln p_x \tag{6}
$$

with $\lambda > 0$. For $a(x) = 0$ this is known as the log-score.

As described previously, any scoring rule has a 0-homogeneous extension
which in this case is $a(x) - \lambda \ln \alpha_x + \lambda \ln \alpha_+$;
however, the extension depends, not just on $\alpha_x$, but on $\alpha_y$ for
all $y \in \mathcal{X}$. Hence, although the scoring rule itself is local, its
0-homogeneous extension is not, reflected in the fact that knowledge of $p$ up
to a constant factor is not sufficient for calculating the log-score. In fact,
there is no nontrivial proper scoring rule with a $\mathcal{G}_0$-local
0-homogeneous extension.

Note that the (Shannon) entropy function
$H(\alpha) = -\lambda \sum_x \alpha_x \log(\alpha_x/\alpha_+)$
for the log-score is regular although the log-score itself is unbounded.
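The failure of locality for the extended log-score can be seen directly: changing a component $\alpha_y$ with $y \neq x$ changes the extended score at $x$. A minimal sketch, taking $a(x) = 0$ and $\lambda = 1$ by default; the vectors are hypothetical.

```python
import math

def log_score_extension(x, alpha, lam=1.0):
    """0-homogeneous extension of -lam * ln p_x:
    -lam * ln(alpha_x) + lam * ln(alpha_+)."""
    return -lam * math.log(alpha[x]) + lam * math.log(sum(alpha))

alpha = [1.0, 2.0, 3.0]
changed = [1.0, 2.0, 30.0]   # only the last component differs; alpha_0 is untouched

# The score at x = 0 moves from ln 6 to ln 33, so it is not a function of
# alpha_0 alone, even though -ln p_0 depends on P only through p_0.
print(log_score_extension(0, alpha))
print(log_score_extension(0, changed))

# It is nevertheless 0-homogeneous: rescaling alpha changes nothing.
print(log_score_extension(0, [5.0 * a for a in alpha]))
```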

3.3. Additive scoring rules. Here we describe a simple way of constructing
a 0-homogeneous local scoring rule. Let $\mathcal{B}$ be a collection of
subsets of $\mathcal{X}$; for $B \in \mathcal{B}$ define
$\mathcal{A}_B := \{\alpha_B : \alpha \in \mathcal{A}\}$, and let
$H_B : \mathcal{A}_B \to \mathbb{R}$ be a concave and 1-homogeneous function
of $\alpha_B$ (and thus also, by extension of its domain, of $\alpha$). Let
$\nabla H_B \in \mathcal{A}_B$ be a 0-homogeneous supergradient of $H_B$ on
$\mathcal{A}_B$; this is also a 0-homogeneous supergradient on the extended
domain $\mathcal{A}$, if we define its components for $x \notin B$ as 0. By
the results of Section 2, this determines a proper scoring rule
$S_B(x,\alpha)$. Moreover, $S_B$ vanishes if $x \notin B$, and otherwise
depends on $\alpha$ only through $\alpha_B$. We now let

$$
S(x,\alpha) = \sum_{B \in \mathcal{B}} S_B(x, \alpha_B), \qquad
H(\alpha) = \sum_{B \in \mathcal{B}} H_B(\alpha_B), \tag{7}
$$

and these define a proper and 0-homogeneous scoring rule and its associated
1-homogeneous entropy function. We shall say that a scoring rule and
entropy function satisfying (7) are $\mathcal{B}$-additive. When each $H_B$ is
a differentiable function of $\alpha_B$, the gradient of $H$ will be the
unique associated scoring rule $S$ of form (7).

We note that if we define an undirected graph $\mathcal{G}$ by $x - y$ if and
only if $x, y \in B$ for some $B \in \mathcal{B}$, we have that the
(0-homogeneous extension of) any $\mathcal{B}$-additive scoring rule is
$\mathcal{G}$-local. If $\mathcal{C}$ denotes the collection of all cliques
of $\mathcal{G}$, that is, all maximal complete subsets of $\mathcal{X}$, we
can collect terms appropriately and rewrite the expansions in (7) above as

$$
S(x,\alpha) = \sum_{C \in \mathcal{C}} s_C(x, \alpha_C) \tag{8}
$$

and

$$
H(\alpha) = \sum_{C \in \mathcal{C}} h_C(\alpha_C). \tag{9}
$$

We say that a scoring rule $S$ and entropy function $H$ having the forms of
(8) and (9) are $\mathcal{G}$-additive.

We remark that the above constructions can also be applied straightforwardly
to the case of a countably infinite sample space $\mathcal{X}$, so long as
every set $B \in \mathcal{B}$ is finite.

We shall in Section 5 give conditions for the converse to hold, that is,
conditions for a $\mathcal{G}$-local scoring rule to be $\mathcal{G}$-additive
as above, without necessarily demanding each term of the decomposition (9) to
be concave or 1-homogeneous.

4. Examples. This section gives some examples of $\mathcal{G}$-additive and
$\mathcal{G}$-local scoring rules.

4.1. Local scoring rules for integer-valued outcomes. We first consider 
cases where the outcomes are nonnegative integers. 

Example 4.1 (Pair scoring rule). Suppose $\mathcal{X} = \{0, 1, 2, \ldots\}$,
and let the graph $\mathcal{G}$ have edges between successive integers. The
cliques are just the pairs, $C_x := \{x, x+1\}$ ($x = 0, 1, \ldots$), and a
concave, 1-homogeneous local entropy function on $C_x$ has the form
$H_x(\alpha_x, \alpha_{x+1}) = \alpha_x G_x(\alpha_{x+1}/\alpha_x)$ with $G_x$
concave. The associated additive scoring rule is

$$
S(x,p) = G'_{x-1}\!\left(\frac{p_x}{p_{x-1}}\right)
  + G_x\!\left(\frac{p_{x+1}}{p_x}\right)
  - \frac{p_{x+1}}{p_x}\, G'_x\!\left(\frac{p_{x+1}}{p_x}\right)
  \qquad (x = 0, 1, \ldots)
$$

with the first term absent if $x = 0$. The total score based on a sample
$(x_1, \ldots, x_n)$ in which the frequency of $y$ is $f_y$
($y = 0, 1, \ldots$) is thus

$$
\sum_{y=0}^{\infty} \left\{ f_y G_y(v_y) + (f_{y+1} - f_y v_y)\, G'_y(v_y) \right\}
$$

with $v_y := p_{y+1}/p_y$. If, for example, we wished to fit the Poisson model
$p_x \propto \theta^x/x!$, we could estimate $\theta$ by minimizing the total
empirical score. Taking $G_x(v) = -(x+1)^a v^m/\{m(m-1)\}$ for $m \neq 0, 1$,
we obtain the unbiased estimating equation

$$
\sum_{y=0}^{\infty} (y+1)^{a-m} \left\{ (y+1) f_{y+1} - \theta f_y \right\} = 0, \tag{10}
$$

yielding a simple explicit formula for the estimate. When $m = a$, we recover
the maximum likelihood estimate $\hat\theta = \bar{x}$.
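As a numerical illustration of this example, the sketch below minimizes the total empirical score for the Poisson model directly and compares the result with a closed-form ratio. The closed form used here, $\hat\theta = \sum_y (y+1)^{a-m+1} f_{y+1} / \sum_y (y+1)^{a-m} f_y$, is our own derivation obtained by differentiating the total score with $v_y = \theta/(y+1)$, and should be read as an assumption rather than a quoted result; the sample, the golden-section search and all function names are likewise illustrative.

```python
import math
from collections import Counter

def total_pair_score(theta, freq, a, m):
    """sum_y { f_y G_y(v_y) + (f_{y+1} - f_y v_y) G_y'(v_y) } for the Poisson
    model, with v_y = theta / (y + 1) and G_y(v) = -(y+1)^a v^m / (m (m-1))."""
    total = 0.0
    for y in range(max(freq) + 1):
        v = theta / (y + 1)
        weight = (y + 1) ** a
        G = -weight * v ** m / (m * (m - 1))
        dG = -weight * v ** (m - 1) / (m - 1)
        f_y, f_next = freq.get(y, 0), freq.get(y + 1, 0)
        total += f_y * G + (f_next - f_y * v) * dG
    return total

def closed_form_estimate(freq, a, m):
    """Root of the estimating equation (derived here, not quoted)."""
    ys = range(max(freq) + 1)
    numerator = sum((y + 1) ** (a - m + 1) * freq.get(y + 1, 0) for y in ys)
    denominator = sum((y + 1) ** (a - m) * freq.get(y, 0) for y in ys)
    return numerator / denominator

def golden_section_minimize(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)

sample = [0, 1, 1, 2, 2, 2, 3, 4, 1, 0, 3, 2]   # hypothetical data
freq = Counter(sample)

numeric = golden_section_minimize(
    lambda t: total_pair_score(t, freq, a=1, m=2), 0.01, 10.0)
print(numeric, closed_form_estimate(freq, a=1, m=2))   # agree closely

# With m = a the closed form reduces to the sample mean.
print(closed_form_estimate(freq, a=2, m=2), sum(sample) / len(sample))
```

Note that neither computation requires the Poisson normalizing constant, since only the ratios $v_y$ enter the score.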

Example 4.2 (Capture-recapture). Consider the following experiment
performed to estimate the number $N$ of fish in a lake. On $c$ consecutive
occasions we catch a fish, at random, and then replace it. When a fish is
first caught it is given a unique tag, so that it can be recognized on
recapture. Each fish $i = 1, \ldots, N$ in the lake has an associated random
variable $X_i$, the number of times it is caught. For large $c$ and $N$ we can
approximate the distribution of $X_i$ by the Poisson distribution with mean
$\theta = c/N$. We will know $f_x$, the number of fish caught $x$ times, for
$x > 0$, but not $f_0$, the number of fish never caught. The observed data
thus arise from a truncated Poisson distribution for $X$, conditioned on
$X > 0$. If we can estimate $\theta$ we can estimate $N = c/\theta$. However,
because of the need to work with the normalization constant of the truncated
Poisson distribution, the maximum likelihood estimate of $\theta$ cannot be
expressed in explicit form and must be determined numerically.

Homogeneous local scoring rules can be used to avoid the normalization
constant problem and obtain an explicit estimate. We simply modify the
above analysis of the full Poisson model by removing the edge 0-1 from the
neighborhood graph $\mathcal{G}$, together with its associated local entropy
function. Equivalently, we redefine $G_0 = 0$. With the other explicit choices
made above, the resulting estimating equation is given by (10) but with the
sums now over $y \ge 1$. For $m = a$ we obtain

$$
\hat\theta = \frac{c - f_1}{n}, \tag{11}
$$

where $n = \sum_{x \ge 1} f_x$ is the number of different fish caught. Note
that $c - f_1$ is the number of times a catch yields a fish which is already
marked. In comparison the maximum likelihood estimate $\hat\theta$ satisfies

$$
\hat\theta = \frac{c - c\, e^{-\hat\theta}}{n},
$$

so $f_1$ in (11) is replaced by the estimate of its expectation
$\mathrm{E}_\theta(f_1) = c\, e^{-\theta}$.

In the interests of robustness we might also omit other data, and again
this is easily done. For example, let the only edge in $\mathcal{G}$ be 1-2;
equivalently, we take $G_x = 0$ for $x \neq 1$. Then, as well as $f_0$, the
counts $f_3, f_4, \ldots$ are also excluded, only the terms for $y = 1$ remain
in (10), and we obtain the robust Zelterman estimate [Zelterman (1988)]:

$$
\hat\theta = \frac{2 f_2}{f_1}.
$$
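The estimators discussed in this example can be compared on a small data set. The sketch below assumes the forms $(c - f_1)/n$ for the explicit estimate with $m = a$, $2f_2/f_1$ for the Zelterman estimate, and the truncated-Poisson likelihood equation $\theta/(1 - e^{-\theta}) = c/n$ for the maximum likelihood estimate; these are our reading of the text, the frequencies are hypothetical, and the bisection solver is one simple choice among many.

```python
import math

def explicit_estimate(freq):
    """(c - f_1) / n, with c the total number of catches and n the number
    of distinct fish caught."""
    c = sum(x * f for x, f in freq.items())
    n = sum(freq.values())
    return (c - freq.get(1, 0)) / n

def zelterman_estimate(freq):
    """2 f_2 / f_1, using only the counts of fish caught once or twice."""
    return 2.0 * freq[2] / freq[1]

def truncated_poisson_mle(freq, tol=1e-12):
    """Solve theta / (1 - exp(-theta)) = c / n by bisection (needs c > n)."""
    c = sum(x * f for x, f in freq.items())
    n = sum(freq.values())
    target = c / n
    truncated_mean = lambda t: t / (-math.expm1(-t))
    lo, hi = 1e-9, target   # the truncated mean exceeds theta, so the root is below target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical frequencies: f_x = number of fish caught exactly x times (x >= 1).
freq = {1: 40, 2: 20, 3: 8, 4: 2}
c = sum(x * f for x, f in freq.items())

for name, theta in [("explicit", explicit_estimate(freq)),
                    ("Zelterman", zelterman_estimate(freq)),
                    ("MLE", truncated_poisson_mle(freq))]:
    print(name, theta, "estimated N =", c / theta)
```

Only the maximum likelihood estimate needs an iterative solver; the other two are explicit functions of the observed counts.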




4.2. Local scoring rules for product spaces. Suppose our discrete sample
space is itself a product space,
$\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_k$.
A point $x$ of $\mathcal{X}$ has the form $x = (x_1, \ldots, x_k)$. We can
define a useful symmetric neighborhood relation on $\mathcal{X}$ by $x - y$
if, for some $i$, $x_{\setminus i} = y_{\setminus i}$, where
$x_{\setminus i} := (x_j : j \neq i)$. A maximal clique of the associated
graph $\mathcal{G}$ is then defined by an index $i \in \{1, \ldots, k\}$
and a vector
$\xi_{\setminus i} \in \mathcal{X}_{\setminus i} := \times_{j \neq i} \mathcal{X}_j$,
and has the form
$C_{i,\xi_{\setminus i}} := \{x : x_{\setminus i} = \xi_{\setminus i}\}$.
Within such a clique, only the value of $x_i$ can vary, over the space
$\mathcal{X}_i$.

We can introduce, for such a clique $C = C_{i,\xi_{\setminus i}}$, a
1-homogeneous concave function $H_C$ of $\alpha_C$. Its gradient will
determine a 0-homogeneous proper scoring rule $S_C(x,\alpha)$, vanishing
unless $x \in C$, that is, $x_{\setminus i} = \xi_{\setminus i}$, and in this
case depending on $\alpha$ only through $\alpha_C$. In particular, $S_C(x,P)$
depends on $P$ only through the implied conditional distribution
$P(\cdot \mid X_{\setminus i} = \xi_{\setminus i})$ for $X_i$, given
$X_{\setminus i} = \xi_{\setminus i}$.

Conversely, any proper scoring rule defined for outcomes in $\mathcal{X}_i$
and distributions over $\mathcal{X}_i$ can be applied to the observed value
$x_i$ of $X_i$ and the conditional distribution
$P(\cdot \mid X_{\setminus i} = \xi_{\setminus i})$ (and taken as 0 if
$x_{\setminus i} \neq \xi_{\setminus i}$); when denormalized, this will be of
the above form. The general $\mathcal{G}$-additive scoring rule can then be
formed by aggregating a collection of such single-clique component scoring
rules:

$$
S(x,P) = \sum_{C} S_C\{x_i, P(\cdot \mid X_{\setminus i} = \xi_{\setminus i})\}\,
  \mathbf{1}(x_{\setminus i} = \xi_{\setminus i}). \tag{12}
$$

4.2.1. Specialization. Although it is allowable that the form of the
component scoring rule $S_C$ in (12) might vary with the conditioning values
$\xi_{\setminus i}$ that, together with the index $i$, determine the clique
$C$, this level of generality will rarely be needed, and we might thus
restrict attention to proper scoring rules of the form

$$
S(x,P) = \sum_{i=1}^{k} S_i\{x_i, P(\cdot \mid X_{\setminus i} = x_{\setminus i})\}, \tag{13}
$$

where $S_i$ is a proper scoring rule for variables in, and distributions over,
$\mathcal{X}_i$. The associated discrepancy function is

$$
d(P,Q) = \sum_i \mathrm{E}_{X \sim P}\,
  d_i\{P(\cdot \mid X_{\setminus i}), Q(\cdot \mid X_{\setminus i})\},
$$

where $d_i$ is the discrepancy function associated with $S_i$.

Recall that we are assuming that $P$, $Q$ are everywhere positive
distributions. If now each $S_i$ is strictly proper, then $d(P,Q) = 0$ if and
only if, for all $i$ and $x_{\setminus i}$,
$Q(\cdot \mid X_{\setminus i} = x_{\setminus i})
  = P(\cdot \mid X_{\setminus i} = x_{\setminus i})$.
But (with strict positivity) this can only occur if $Q = P$, so $S$ is
strictly proper.

4.2.2. Markov models. Suppose now that we have an undirected graph
$\mathcal{K}$ with vertices $\{1, \ldots, k\}$, and we restrict attention to
distributions $P$ that are Markov with respect to $\mathcal{K}$. Any component
score $S_C$ in (12), or $S_i$ in (13), will then depend only on the value
$x_i$ of $X_i$, and the conditional distribution of $X_i$ given the neighbors
$X_{\mathrm{ne}(i)}$ of $i$ in $\mathcal{K}$. It is possible to calculate this
conditional distribution without having access to the normalizing constant
of the overall distribution $P$, which is often hard to compute. In
particular, in the estimation context described in the Introduction, this can
greatly ease construction and solution of the unbiased estimating equation
associated with this scoring rule.

A prominent example of a scoring rule of this kind is the pseudo-likelihood 
function introduced by Besag (1975): 

Example 4.3 (Pseudo-likelihood). When every component scoring
rule $S_i$ in (13) is the log score, the overall rule will be just the
negative logarithm, $S(x,P) = -\log \mathrm{PL}(P,x)$, of the
pseudo-likelihood function, defined, for a joint distribution $P$ and outcome
vector $x$, as

$$
\mathrm{PL}(P,x) := \prod_i P(X_i = x_i \mid X_{\setminus i} = x_{\setminus i})
$$

(where in the context of a Markov model the conditioning variables
$X_{\setminus i}$ can be reduced to $X_{\mathrm{ne}(i)}$). Hence general
properties of proper scoring rules can be applied to pseudo-likelihood. In
particular, a maximum pseudo-likelihood estimator will typically be consistent
under independent and identically distributed repetitions (though this
argument does not address consistency under increasing dimension $k$, which is
more relevant in many applications of pseudo-likelihood).
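To illustrate why the normalizing constant is not needed, the sketch below evaluates $-\log \mathrm{PL}(P,x)$ for a small binary model specified only through unnormalized weights, and compares it with the same quantity computed from the fully normalized distribution. The model (coefficients `J` and `h`), the outcome `x` and all function names are hypothetical.

```python
import itertools
import math

# Hypothetical 3-variable binary model with pairwise interactions.
J = [[0.0, 0.8, -0.5],
     [0.0, 0.0, 0.3],
     [0.0, 0.0, 0.0]]
h = [0.2, -0.1, 0.4]

def weight(x):
    """Unnormalized probability exp(sum_i h_i x_i + sum_{i<j} J_ij x_i x_j)."""
    k = len(x)
    energy = sum(h[i] * x[i] for i in range(k))
    energy += sum(J[i][j] * x[i] * x[j] for i in range(k) for j in range(i + 1, k))
    return math.exp(energy)

def conditional(i, x, w):
    """P(X_i = x_i | all other coordinates of x), using only ratios of w."""
    alternatives = [x[:i] + (value,) + x[i + 1:] for value in (0, 1)]
    return w(x) / sum(w(y) for y in alternatives)

def neg_log_pseudo_likelihood(x, w):
    """S(x, P) = -log PL(P, x) = -sum_i log P(X_i = x_i | rest)."""
    return -sum(math.log(conditional(i, x, w)) for i in range(len(x)))

x = (1, 0, 1)
score = neg_log_pseudo_likelihood(x, weight)

# The same computation from the normalized distribution, for comparison only.
Z = sum(weight(y) for y in itertools.product((0, 1), repeat=3))
normalized = lambda y: weight(y) / Z
print(score, neg_log_pseudo_likelihood(x, normalized))   # agree up to rounding
```

The sum over all $2^k$ configurations is computed here only to confirm the agreement; the pseudo-likelihood itself uses two weight evaluations per coordinate.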

Replacing the log score with the Brier score leads to the method of ratio
matching [Hyvärinen (2007)].

Example 4.4 (Ratio matching). For the case $\mathcal{X}_i = \{0, 1\}$, take
every component score $S_i$ in (13) to be the Brier score, leading to the
overall scoring rule

$$
S(x,P) = \sum_i \{x_i - P(X_i = 1 \mid X_{\setminus i} = x_{\setminus i})\}^2. \tag{14}
$$

For a parametric model $\mathcal{Q} = \{Q_\theta : \theta \in \Theta\}$, we
could estimate $\theta$ by minimizing the total score
$\sum_j S(x^{(j)}, Q_\theta)$ over the observed vectors
$x^{(1)}, \ldots, x^{(n)}$. This would equivalently minimize the empirical
discrepancy $d(P_n, Q_\theta)$, where

$$
d(P,Q) = \sum_{i=1}^{k} \sum_{\xi_{\setminus i} \in \{0,1\}^{k-1}}
  P(X_{\setminus i} = \xi_{\setminus i})
  \{P^{(i)}(X_i = 1) - Q^{(i)}(X_i = 1)\}^2
$$

with $P^{(i)}(X_i = 1) = P(X_i = 1 \mid X_{\setminus i} = \xi_{\setminus i})$,
etc. This can be shown to agree with the more complex formula (13) of
Hyvärinen (2007).$^2$
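A short sketch of the ratio matching score for binary vectors follows. It takes the binary Brier component in the squared-error form $\{x_i - q_i\}^2$, where $q_i$ is the conditional probability that $X_i = 1$; this is an affine transformation, $2(x_i - q_i)^2 - 1$, of the form $\|p\|^2 - 2p_x$ used in Example 2.1, which the code checks. The unnormalized model `weight` and the function names are hypothetical.

```python
import math

def weight(x):
    """Hypothetical unnormalized model on {0,1}^3 (illustrative only)."""
    return math.exp(0.5 * x[0] - 0.3 * x[1] + 0.9 * x[0] * x[2] + 0.2 * x[1] * x[2])

def prob_one(i, x, w):
    """P(X_i = 1 | other coordinates of x), computed from ratios of w only."""
    one = x[:i] + (1,) + x[i + 1:]
    zero = x[:i] + (0,) + x[i + 1:]
    return w(one) / (w(one) + w(zero))

def ratio_matching_score(x, w):
    """Sum over i of {x_i - P(X_i = 1 | rest)}^2."""
    return sum((x[i] - prob_one(i, x, w)) ** 2 for i in range(len(x)))

def brier_binary(x_i, q):
    """Brier score ||p||^2 - 2 p_x for the binary distribution p = (1 - q, q)."""
    p = (1.0 - q, q)
    return p[0] ** 2 + p[1] ** 2 - 2.0 * p[x_i]

x = (1, 0, 1)
print(ratio_matching_score(x, weight))

# Rescaling the weights leaves the score unchanged: no normalizing constant needed.
print(ratio_matching_score(x, lambda y: 100.0 * weight(y)))

# Affine relation between the two forms of the binary Brier score.
q = prob_one(0, x, weight)
print(brier_binary(x[0], q), 2.0 * (x[0] - q) ** 2 - 1.0)
```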

5. Characterizing local scoring rules. Any positive linear combination of
the log-score $-\lambda \ln p_x$ and a $\mathcal{G}$-additive score of form
(8) will be $\mathcal{G}$-local. We now develop a converse to this result,
assuming henceforth that $S(x,P)$ is continuously differentiable on
$\mathcal{P}$. Under additional conditions on the neighborhood relation
$\mathcal{N}$, we show that any such proper local scoring rule must be
$\mathcal{G}$-local for a suitably defined graph $\mathcal{G}$ and equal to a
positive linear combination of the log-score and a 0-homogeneous
$\mathcal{G}$-additive score.

We say $x$ is related to $y$ and write $x \sim y$, if $x, y \in N_z$ for some
neighborhood $N_z \in \mathcal{N}$. Let $\rho(x) := \{y : y \sim x\}$ denote
the set of relatives of $x$. Consider now the following condition on the
neighborhood system $\mathcal{N}$:

Condition 5.1. There exist $y_1, y_2 \in \mathcal{X}$ such that, with
$\rho_i := \rho(y_i)$:

$$
\rho_1 \cap \rho_2 = \emptyset, \tag{15}
$$

$$
\rho_1 \cup \rho_2 \neq \mathcal{X}. \tag{16}
$$

Note that in the special case of the trivial neighborhood system
$\mathcal{N}_0$, that is, $N_x = \{x\}$ for all $x$, this condition is
equivalent to the condition $\#\mathcal{X} > 2$, as required for the log score
to be the only proper $\mathcal{N}_0$-local scoring rule; see Section 3.2.

Assume now that $S$ is a proper scoring rule. For fixed $P \in \mathcal{P}$,
$S(P,Q)$ is then minimized in $Q$, subject to $Q \in \mathcal{P}$, at $Q = P$.
Introducing, for each $P$, a Lagrange multiplier $\lambda(P)$ for the
constraint $\sum_x q_x = 1$, we must thus have

$$
\sum_x p_x \frac{\partial}{\partial p_y} S(x,P) + \lambda(P) = 0
  \quad \text{for all } y \in \mathcal{X}. \tag{17}
$$

In the case of a 0-homogeneous proper local scoring rule, we could without
loss of generality assume that the neighborhood system
$\mathcal{N} = \{N_x, x \in \mathcal{X}\}$ was determined by an undirected
graph $\mathcal{G}$. In general this is not necessarily the case, as the
following example shows.



$^2$ The further analysis in that paper does not agree with our (14), and
appears to contain some errors.

Example 5.1. A simple example that does not satisfy the condition is
the neighborhood system determined by the undirected graph on $\{1, 2, 3\}$
with the single edge 1-2, where $\rho(1) = \rho(2) = \{1, 2\}$,
$\rho(3) = \{3\}$. For this graph we can define a scoring rule as follows:

$$
S(1,P) = S(2,P) = (1 - p_1 - p_2)^2, \qquad S(3,P) = (1 - p_3)^2.
$$

Then $S$ is $\mathcal{G}$-local, and can easily be shown to be proper (it is
an affine transformation of the Brier score for the event $X = 3$). However,
its 0-homogeneous extension is
$S(1,\alpha) = S(2,\alpha) = (\alpha_3/\alpha_+)^2$,
$S(3,\alpha) = \{(\alpha_1 + \alpha_2)/\alpha_+\}^2$, where
$\alpha_+ := \alpha_1 + \alpha_2 + \alpha_3$. Thus the 0-homogeneous extension
of $S$ is not $\mathcal{G}$-local, and in particular not
$\mathcal{G}$-additive.

For neighborhood systems $\mathcal{N}$ which satisfy Condition 5.1 we have
the following lemma.

Lemma 5.1. Suppose $S$ is proper and $\mathcal{N}$-local. If Condition 5.1
holds, then $\lambda(P)$ satisfying (17) is constant on $\mathcal{P}$.

Proof. For $\mathcal{N}$-local $S$, condition (17) gives for any
$y \in \mathcal{X}$:

$$
\sum_{\{x : y \in N_x\}} p_x \frac{\partial S(x,P)}{\partial p_y} + \lambda(P) = 0, \tag{18}
$$

as $\partial S(x,P)/\partial p_y = 0$ unless $y \in N_x$. For any term in the
sum in (18), $S(x,P)$, and thus $\partial S(x,P)/\partial p_y$, depends only
on $p_{N_x}$, hence, since $y \in N_x$, only on
$p_{\rho(y)} = \{p_z : z \in \rho(y)\}$. Taking $y = y_1$, this implies that
$\lambda(P)$ depends only on $p_1 := \{p_z : z \in \rho_1\}$; similarly,
$\lambda(P)$ depends only on $p_2 := \{p_z : z \in \rho_2\}$.

By (16) we can take $w \in \mathcal{X} \setminus (\rho_1 \cup \rho_2)$.
Starting at $p$, consider a change $\delta p_1$ to $p_1$, such that
$p_x + \delta p_x \in (0, 1)$ ($x \in \rho_1$), and
$\delta p_1^+ := \sum_{x \in \rho_1} \delta p_x \in (p_w - 1, p_w)$. Extend
the variation $\delta p_1$ to the whole of $p$ by
$\delta p_w = -\delta p_1^+$, $\delta p_x = 0$ (all other $x$). Then
$p + \delta p \in \mathcal{P}$. Since $\lambda(p)$ depends only on $p_2$,
which has not changed, $\lambda(p + \delta p) = \lambda(p)$. But we can also
express $\lambda(P)$ as $\lambda^*(p_1)$, whence $\lambda^*$ must be constant
in an open neighborhood of $p_1$. It follows that $\lambda(P)$ is constant on
$\mathcal{P}$. □

We now have that, under Condition 5.1, any $\mathcal{N}$-local proper scoring
rule must satisfy: for all $P \in \mathcal{P}$ and all $y \in \mathcal{X}$,

$$
\sum_x p_x \frac{\partial S(x,P)}{\partial p_y} + \lambda = 0 \tag{19}
$$

for some scalar $\lambda \in \mathbb{R}$.

We note that a particular $\mathcal{N}$-local solution of (19) is given by
the log-score, $S(x,P) = -\lambda \ln p_x$. Because (19) is linear, the
general solution is thus $S(x,P) = -\lambda \ln p_x + S_0(x,P)$, where, for
all $P \in \mathcal{P}$ and all $y \in \mathcal{X}$, $S_0$ satisfies the key
equation:

$$
\sum_x p_x \frac{\partial}{\partial p_y} S(x,P) = 0. \tag{20}
$$

We thus can, and henceforth shall, restrict attention to such key local
scoring rules. We next show that the 0-homogeneous extension of a key local
scoring rule is $\mathcal{G}$-additive for a suitable undirected graph
$\mathcal{G}$.

Let $H(P) := \sum_x p_x S(x,P)$ be the associated entropy function. Then (20)
implies $S(y,P) = \partial H(P)/\partial p_y$. It follows that

$$
\frac{\partial S(x,P)}{\partial p_y} = \frac{\partial S(y,P)}{\partial p_x}. \tag{21}
$$

Hence if $y \in N_x$ but $x \notin N_y$, $\partial S(x,P)/\partial p_y$ must
nevertheless vanish. Let $\mathcal{G}$ be the undirected graph in which $x$
and $y$ are neighbors if both $x \in N_y$ and $y \in N_x$. We call
$\mathcal{G}$ the symmetric core of $\mathcal{N}$. Then any key
$\mathcal{N}$-local proper scoring rule must in fact be $\mathcal{G}$-local.
So we henceforth confine attention to $\mathcal{G}$-locality for an undirected
graph $\mathcal{G}$ and assume that the neighborhoods are determined by
$\mathcal{G}$ as $N_x = \{x\} \cup \mathrm{bd}(x)$.

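Lemma 5.2. Under Condition 5.1, if $S$ is a key $\mathcal{G}$-local scoring
rule, its 0-homogeneous extension is $\mathcal{G}$-local.

Proof. With $P = \alpha/\alpha_+$ we obtain by differentiation that, for any
$y$,

$$
\frac{\partial S(x,\alpha)}{\partial \alpha_y}
  = \frac{1}{\alpha_+} \left\{ \frac{\partial S(x,P)}{\partial p_y}
      - \sum_z p_z \frac{\partial S(x,P)}{\partial p_z} \right\}
  = \frac{1}{\alpha_+} \frac{\partial S(x,P)}{\partial p_y},
$$

as (20) and (21) imply that the second term within braces vanishes. Hence
the result follows. □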

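Before we proceed to show that under Condition 5.1, any key
$\mathcal{G}$-local scoring rule is $\mathcal{G}$-additive, the following
result is useful.

Lemma 5.3. Suppose $x$ and $y$ are distinct and not neighbors in
$\mathcal{G}$, Condition 5.1 holds, and $S$ is key $\mathcal{G}$-local.
Then its entropy function $H$ satisfies

$$
\frac{\partial^2 H(\alpha)}{\partial \alpha_x\, \partial \alpha_y} = 0.
$$

Proof. In this case, by Theorem 2.4,
$\partial^2 H(\alpha)/\partial\alpha_x\,\partial\alpha_y
  = \partial S(x,\alpha)/\partial\alpha_y = 0$ by Lemma 5.2. □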

The following lemma is straightforward:

Lemma 5.4. Let $f(\alpha)$ be a twice continuously differentiable function.
Then $\partial^2 f(\alpha)/\partial\alpha_x\,\partial\alpha_y = 0$ if and
only if

$$
f(\alpha) = f(\alpha_W, \alpha_x, \alpha_y^*) + f(\alpha_W, \alpha_x^*, \alpha_y)
  - f(\alpha_W, \alpha_x^*, \alpha_y^*) \tag{22}
$$

for some (and then any) values $\alpha_x^*$, $\alpha_y^*$, where
$W := \mathcal{X} \setminus \{x, y\}$.

We now proceed to establish $\mathcal{G}$-additivity for any key
$\mathcal{G}$-local scoring rule:

Theorem 5.5. Let $\mathcal{C}$ be the set of maximal cliques of the graph
$\mathcal{G}$ satisfying Condition 5.1, and let $S$ be a key
$\mathcal{G}$-local scoring rule. Then we can express the scoring rule and
associated entropy function as

$$
S(\alpha) = \sum_{C \in \mathcal{C}} s_C(\alpha_C), \qquad
H(\alpha) = \sum_{C \in \mathcal{C}} h_C(\alpha_C). \tag{23}
$$

Further, if $H$ is regular, each term $h_C$ in the expansion can be chosen to
be 1-homogeneous.

Proof. The proof parallels that of the Hammersley-Clifford theorem as
given in Grimmett (1973); see also Lauritzen (1996), page 36. For a fixed
$\alpha^*$ and all subsets $A \subseteq \mathcal{X}$ we define

$$
\eta_A(\alpha_A) = H(\alpha_A, \alpha^*_{\mathcal{X} \setminus A}) \tag{24}
$$

and note that then $H(\alpha) = \eta_{\mathcal{X}}(\alpha)$. Next, we define
for all $B \subseteq \mathcal{X}$

$$
h_B(\alpha_B) = \sum_{A : A \subseteq B} (-1)^{|B \setminus A|}\, \eta_A(\alpha_A). \tag{25}
$$

The Möbius inversion formula [see, e.g., Lauritzen (1996), page 239] then
yields: for all $A \subseteq \mathcal{X}$,

$$
\eta_A(\alpha_A) = \sum_{B : B \subseteq A} h_B(\alpha_B).
$$

Thus, taking $A = \mathcal{X}$, we have established

$$
H(\alpha) = \sum_{B : B \subseteq \mathcal{X}} h_B(\alpha_B). \tag{26}
$$

We next show that all terms $h_B$ in (26) vanish unless $B$ is a complete set
and (23) then follows by collecting appropriate terms. So suppose there
exist $x, y \in B$ that are distinct and not neighbors in $\mathcal{G}$. We
then let $D = B \setminus \{x, y\}$ and write the expression in (25) as

$$
h_B(\alpha_B) = \sum_{A : A \subseteq D} (-1)^{|D \setminus A|}
  \left( \eta_A - \eta_{A \cup \{x\}} - \eta_{A \cup \{y\}} + \eta_{A \cup \{x,y\}} \right),
$$

where we have abbreviated
$\eta_U := \eta_U(\alpha_U) = H(\alpha_U, \alpha^*_{\mathcal{X} \setminus U})$,
etc. But each of the terms in this expansion vanishes by Lemma 5.4 and hence
$h_B$ vanishes as required.

If $H$ is regular, we can choose $\alpha^*_x = 0$ for all $x \in \mathcal{X}$
so each function $\eta_A$ in (24) and hence each $h_B$ in (25) will be
1-homogeneous.

This establishes the desired expansion of the entropy function. The
expansion for the scoring rule is obtained by forming gradients. □
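The Möbius-inversion step of the proof can be illustrated on a small example. The sketch below takes a hypothetical path graph 0-1-2-3, builds an entropy function as a sum of pairwise spherical-type terms $-\sqrt{\alpha_x^2 + \alpha_y^2}$ over its edges (a regular, 1-homogeneous choice made purely for illustration), uses the reference point $\alpha^* = 0$, and checks that the interaction terms $h_B$ vanish for every subset $B$ that is not complete while the remaining terms sum to $H(\alpha)$.

```python
import itertools
import math

EDGES = [(0, 1), (1, 2), (2, 3)]   # hypothetical path graph 0-1-2-3
POINTS = range(4)

def entropy(alpha):
    """H(alpha): sum over edges of -sqrt(alpha_x^2 + alpha_y^2)."""
    return -sum(math.hypot(alpha[x], alpha[y]) for x, y in EDGES)

def eta(A, alpha):
    """eta_A(alpha_A): H evaluated with alpha set to alpha* = 0 outside A."""
    return entropy([alpha[x] if x in A else 0.0 for x in POINTS])

def subsets(B):
    """All subsets of B, as frozensets."""
    B = sorted(B)
    return [frozenset(c) for r in range(len(B) + 1)
            for c in itertools.combinations(B, r)]

def h(B, alpha):
    """h_B(alpha_B) = sum over A subset of B of (-1)^{|B \\ A|} eta_A(alpha_A)."""
    return sum((-1) ** (len(B) - len(A)) * eta(A, alpha) for A in subsets(B))

def is_complete(B):
    """True if every pair of distinct points in B is joined by an edge."""
    return all((x, y) in EDGES for x, y in itertools.combinations(sorted(B), 2))

alpha = [0.4, 0.3, 0.2, 0.1]
terms = {B: h(B, alpha) for B in subsets(POINTS)}

print(sum(terms.values()), entropy(alpha))   # Moebius inversion: the two agree
for B, value in terms.items():
    if not is_complete(B):
        print(sorted(B), value)              # zero up to rounding
```

With $\alpha^* = 0$ each $h_B$ is also 1-homogeneous, so rescaling `alpha` rescales every term by the same factor.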

It does not seem to be true in general that each term in (23) can also 
be chosen to be concave. If this is indeed the case, S and H are built up 
additively from proper scoring rules and entropy functions defined on cliques. 

6. Summary and discussion. We have defined a proper scoring rule
$S(x,P) = s(x,p)$ for discrete sample space $\mathcal{X}$ to be local relative
to a neighborhood system $\mathcal{N} = \{N_x\}_{x \in \mathcal{X}}$ if each
$s(x,p)$ only depends on $p$ through its restriction $p_{N_x}$ to $N_x$, and
shown how to construct such scoring rules from additive components.
Conversely, we have shown that under appropriate regularity conditions any
proper local scoring rule has this structure, although the additive components
may not in general each correspond to proper scoring rules.

A definition of homogeneous local scoring rule for a general well-behaved
topological outcome space $\mathcal{X}$ that would unify the discrete and
continuous case would be to say that a scoring rule is homogeneous and local
if it satisfies

$$
S(x,Q) = S(x, Q_{|N_x}) \tag{27}
$$

for every open neighborhood $N_x$ of $x$ and every $x \in \mathcal{X}$.

Clearly, the homogeneous local scoring rules investigated in Parry, Dawid
and Lauritzen (2012) satisfy this requirement. It would be interesting to
obtain a complete characterization of proper scoring rules satisfying (27).

Acknowledgment. We are grateful to Valentina Mameli for comments on
an earlier version of this article.